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Summary. 

Generating multivariate Poisson data is essential in many applications. Current simulation 
methods suffer from limitations ranging from computational complexity to restrictions on the 
structure of the correlation matrix. We propose a computationally efficient and conceptually 
appealing method for generating multivariate Poisson data. The method is based on simulat- 
ing multivariate Normal data and converting them to achieve a specific correlation matrix and 
Poisson rate vector. This allows for generating data that have positive or negative correlations 
as well as different rates. 



1. Introduction 

The most common multivariate distribution in the statistics hterature is the multivariate 
Normal {Gaussian) distribution. Generating multivariate Normal data is relatively easy 
and fast. It has been used for many purposes and in a vast number of applications. In 
many applications, however, the multivariate data that arise are not well approximated by 
a multivariate Normal distribution. Low count data are one example. Highly skewed data 
are another. In such cases the process generating the data is better described by a Poisson 
process and the resulting data by a multivariate Poisson distribution. Such data arise in 
many disciplines: In healthcare, for example, multivariate Poisson data arise when looking 
at multiple series of patient arrivals, each related to a different disease (or at different 
hospitals within the same geographic area). In marketing and management science (e.g., 
pricing, revenue management) the multivariate Poisson distribution can be used to simulate 
purchases of complementary products or of substitutes. In the latter case, the correlation 
between purchases of substitute products is expected to be negative. 

Apart from the above examples, simulating multivariate Poisson data is also useful for 
different functions su ch as fitting d istributions to multivariate data (fitting to a multivari- 



ate Poisson, see e.g., iKarlisI ( 2QQ3h ) and evaluating performance of statistical models by 



creating simulated data. Previous studies have proposed methods for generating bivariate 
and multivariate Poisson vectors based on the joint probability function. However, such 
methods tend to be too complicated for actual implementation and are therefore rarely 
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used for purposes of simulation. The popular alternative is to use a multivariate Normal 
approximation, which is not always adequate. 

In this paper we present a new method for generating multivariate Poisson vectors 
based on multivariate Normal distribution. The method is elegant in terms of its con- 
ceptual simplicity and its computational efficiency. Yet, it is powerful enough to allow a 
flexible correlation structure, including negative and positive correlations and unequal rates. 
Our method can be considered a generalization of the univariate case by incorporating a 
correlation structure that is based on a multivariate Normal formulation. 

The paper is organized as follows: In Section [2] we describe the classic approach for 
generating multivariate Normal data. In Section [3] we survey existing methods for gener- 
ating multivariate Poisson data and then introduce our new method. Section 2] provides 
a graphical illustration of the Poisson data generated by our method and their properties. 
We conclude in Section [5] 



2. Generating Multivariate Normal Data 

We consider the common p-variate Normal distribution with mean jH and covariance matrix 
S (denoted A ^(/i, H)) . Simu lating approaches for multivariate Normal vector have been ffist 
ad dressed by iRiplevI ITqStI) . 



IressecL by IKiplevI 119^71) . 

iHerndvlg vi (1998) introduced the following algorithm for generating a p-dimensional 
Normal vector, based on the Central Limit Theorem: 

(a) Generate X, such that X = 12/7(0, 1) — 6, where /7(0, 1) denotes a vector of uniform 
(0,1) variates. According to the Central Limit Theorem X is approximately A/'(0, /). 

(b) Let Q be the pxp matrix whose columns are the normalized eigenvectors of S, and A 
be a diagonal matrix whose diagonal entries are the eigenvalues of S. Let Q = A^Q. 
Then, f = QX + /i - N{fl, S) 

This algorithm is used in the software R for simulating multivariate Normal data. In 
general, most standard software for statistical computing (e.g., R, Matlab) have a function 
for simulating multivariate Normal data. 



3. Generating Multivariate Poisson Data 

The p-dimensional Poisson distributions is characterized by a mean (or rate) vector A and 
a covariance matrix T^pois that has diagonal elements equal to A. 

Sampling from multivariate Poisson distribution has been addressed massively in the 
literature, with a major focus on the bivariate case. It is customary to use the term "multi- 
variate Poisson" for any extension of the univariate Poisson distribution where the resulting 
marginals are of univariate Poisson form. In other words, the same term is used to describe 
different multivariate distributions, which have in common the property that their marginals 
are univariate Poisson. 

One of the earliest simulation methods was proposed bv lKrummenauerl (|l998l . [l999l ). The 
algorithm first generates and then convolves independent univariate Poisson variates with 
appropriate rates. The author presented a recursive formula to carry out the convolution 
in polynomial time. This method enables the simulation of multivariate Poisson data with 
arbitrary covariance structure. The main limitations of this method is its high complexity 
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(the recursions become very inefficient when p increases). Also, the method does not support 
negative correlation. 



Minhajuddin et al.l (|2QQ4l ) presented a method for simulating multivariate Poisson data 



based on the Negative Binomial - Gamma mixture distribution. First, a value k is generated 
from a Negative Binomial distribution with rate r and 11 = Then, conditional on /c, 

a set of p independent Gamma variates are generated (Xi, . . . The sum of the joint 

distribution of k and Xi, . . . ,Xp has a Gamma marginal distribution with rates r and A. 
The correlation between a pair Xi and Xj (i 7^ j) is There are two main drawbacks to 
this approach: First, it requires the correlation between each pair of variates to be identical 
(Pi i = P for all i 7^ j). Second, it does not support a negative correlation. 

^ Karlisl(|2003 h points out that the main obstacle limiting the use of multivariate simulation 



methods for Poisson data, including the above-mentioned methods, is the complexity of 
calculating the joint probability function. He mentions that the required summations might 
be computationally e xhausting in some ca ses, especially when the dimension p is high. 
In a recent paper [Sh in and Pasupathvl ([2007) present a fast method of generatin g a bi- 



variate Poisson process with negative correlation. Similar to the approach of lMinhajuddin et al 



(|2004f ). their methods is based on generating two dependent random variables from three 
independent random variables. The authors propose a computationally fast modification to 
the trivariate reduction that enables generating a bivariate Poisson with a specified negative 
correlation. 

In this paper, we propose an alternative approach for generating a p-dimensional Poisson 
vector with covariance matrix T^pios and rate A. The basic idea is first to generate a p- 
vector from the multivariate Normal distribution and then to transform it to Poisson using 
the Gaussian and Poisson cumulative distribution functions (CDFs). We use the following 
notation for these CDFs, respectively: 

H^) = J 2 du (1) 

Our algorithm proceeds as follows: 

(a) Generate a p-dimensional Normal vector X^ with mean /I = 0, variance a = 1 and a 
correlation matrix . 

(b) For each value X/^, z G 1, 2, ...p, calculate the Normal CDF: 

(c) For each <I>(X^^), calculate the Poisson inverse CDF (quantile) with rate A^: 

Xf°- = 5-i($(Xf)) 



The vector X^^** is then a p-dimensional Poisson vector with correlation matrix R^^'^^ 
and rates A. 

Theorem: The vector X^^** is a p-dimensional vector with Poisson marginals. 
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Proof: From the properties of multivariate Normal distribution, is a p-dimensional 
vector with Normal marginals. Therefore, by transforming the marginals using the Normal 
CDF we obtain a p-dimensional vector ^{X^) with uniform marginals. 

We then transform the uniform variates to Poisson variates using the Poisson inverse 
CDF, and thus obtain p-dimensional Poisson marginals. 

In the next sections, we experimentally show that the resulting correlation structure 
j^Pozs stochastically equals the desired correlation structure . 



3. 1. Implementation Issues 

The first two steps are easy to implement (numerical procedures for computing the Normal 
CDF are computationally efficient and widely available). The difficulty arises in the last 
step, in computing the Poisson inverse CDF, because of the factorial in the denominator 
of S (see equation ([2])). Solutions are as follows: For small values of A (e.g., A < 10) 
Poisson variates can be efficiently simulated by g enerating expon ential (inter-arrival) ran- 
dom variates and then s umming them (see, e.g., iDevrove ( 19861 )). For larger values of A, 
|Ah reus and Dieted (|l982t) proposed an algorithm for generating univariate Poisson variates 
by modifying Normal variates. An alternative is to approximate the factorial in equation 
([2]) using the Sterling approximation. With today's computational power, this is a feasible 
possibility. 



4. Graphical Illustration 

We have not proved that the operations in steps (2)- (3) of our method maintain the original 
correlation structure (i.e., that ($(•)) maintains (R^). We next show, in a series of 
experiments, that this indeed is the case. In particular, we show that the multivariate 
Poisson data that are generated by our method have Poisson marginal distributions and a 
correlation structure equal to the one specified in step (1). 

We use the software R to generate the data, with the functions mvrnorm to generate 
multivariate Normal vectors, pnorm for the Normal CDF and qpois for the Poisson inverse- 
CDF. The code in given in the Appendix. 

For each of the cases in sections 4.1-4.2 below we generate 50,000 3-dimensional Poisson 
variates with rate vectors A = {Ai,A2,A3} and correlation matrix R. When the variance 
is different from 1, we provide the covariance matrix S which has elements: Vi, j G p : 
Cov{Xi, Xj) = pijCJiCjj^ where pij is the correlation between xi and Xj. 



4. 1. Constant rate and equal correlations 

We start with the simple case of a multivariate Poisson distribution with equal pairwise 
correlations {p = 0.4) and a constant rate vector (Ai = A2 = A3 = 2). In other words, we 
generate 3-dimensional Poisson data with the following correlation and covariance matrices: 



2 0.8 0.8 
S = I 0.8 2 0.8 
0.8 0.8 2 



R = 



1 

0.4 
0.4 



0.4 
1 

0.4 



0.4 
0.4 
1 



(3) 
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Figure [T] compares the initial multivariate Normal data (left panel) and the resulting 
multivariate Poisson data (right panel). Each panel contains scatter plots for each of the 
three pairs of series (bottom-left), the estimated correlation coefficients (upper-right), and 
histograms of the marginal distributions (diagonal). We see that the Normal and Poisson 
data exhibit Normal and Poisson marginals, respectively, and that both datasets share 
the same correlation structure, as desired. To further examine the fit of each of the three 
marginals to a Poisson distribution we provide scatter plots of the marginal counts compared 
to their expected values under a Poisson distribution in Figure [21 





Fig. 1. Comparing the simulated Normal (left) and Poisson (right) multivariate data with correlation 
matrix given in equation[3] 



4.2. Non-constant rate, non-constant correlations, and negative correlations 

We next examine the case of a non-constant Poisson rate vector, unequal pairwise correla- 
tions, and negative correlations. All tests applied to the simulated data indicate that the 
pre-specified correlation matrix is maintained and that the marginal distributions are each 
Poisson. For sake of brevity we present only results for one configuration, but the same 
results were obtained for a wide range of vector rates and correlation matrices. In this test 
we set A = {5, 10, 15} and 



/ 5 -2.83 3.33 \ / 1 -0.4 0.4 \ 

E= -2.83 10 4.9 R=i -0.4 1 0.5 (4) 

\ 3.33 4.9 15 y \ 0.4 0.5 1 / 



The simulated 3-variate Poisson data are shown in Figure [3] and the goodness of fit to 
a Poisson distribution is shown in the observed vs. expected scatter plots in Figure [H We 
see that the estimated correlations are very close to those in the pre-specified R (including 



6 Yahav and Shmueli 



V1 



V2 




n \ \ \ \ r 
5000 15000 25000 

Univariate marginals 




n \ \ r 
15000 25000 

Univanate marginals 



V3 




n \ \ \ \ r 

5000 15000 25000 

Univanate marginals 



Fig. 2. Comparing the marginal counts to Poisson expected values 



the negative correlations) and that each of the marginal distributions closely fits a Poisson 
distribution. 



4.3. Multivariate Poisson with Low Rates 

Finally, we examine our approach for low rates (A < 4). It is well known that the Poisson 
distribution with high rates (A > 5) is approximately a Normal distribution, and we thus 
expect the actual correlation of the multi variate Poisso n random variables to match the 
desired correlation. However, as shown in IWhitt) (Il976l), the feasible correlation between 
two random Poisson variables is no longer in the range [—1,1], but rather \min_corr = 
corr C^ \^ (U)i'^\^ ('^ — U))i max_corr = corrCB^^ {U)^S^^ (U))]. Infact JShin and Pasupathv 
(|2QQ7l ) show that when Ai, A2 ^0 the minimum feasible correlation miri-corr 0. There- 
fore, our transformation maps a correlation range of [—1, 1] (multivariate normal) to a much 
smaller range [miri-corr > —l^max-corr < 1]. Figure [5] illustrates the relationship between 
the desired correlation and the resulting actual correlation when generating bivariate Pois- 
son data with low rates (A < 1). 

We find that the relationship between the desired correlation and the actual correlation 
can be approximated by an exponential form: 



R 



Pois 



axe 



where the coefficient a, h and c can be estimated from the points (min _corr, -1), (max _corr. 
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Fig. 3. Test 2: Simulating multivariate Poisson data with correlation matrix given in equationH 



1) and (0, 0): 



max_corr x miri-corr 

a = 

max.corr + min.corr 

, , max-corr + a , 
b log[ ) 



Figure [6] illustrates the simulation performance when using the above ap- 
proximation to correct for the distortion in the resulting correlation. This 
is illustrated for the bivariate Poisson case with rates that range (Ai,A2) G 
(0.1, 0.1), (0.1, 0.5), (0.5, 0.5), (0.5, 0.9), (0.9, 0.9). The absolute difference between the ac- 
tual and desired correlation is less then 0.05. 



5. Conclusions 

Simulating multivariate Poisson data is essential in many real-world applications in a wide 
range of fields such as healthcare, marketing, management science, and many others. Cur- 
rent simulation methods suffer from computational limitations and restrictions on the cor- 
relation structure, and therefore are rarely used. 

In this paper we propose an elegant way to generate multivariate Poisson data based on 
a multivariate Normal distribution with a pre-specified correlation matrix and Poisson rate 
vector. Because multivariate Normal and univariate Poisson simulators are implemented in 
many standard statistical software packages, implementing our method requires only a few 
lines of code. 

We show that our method works well for different correlation structures (both negative 
and positive; and varying values) and for non-constant rates. We show that the method is 
highly accurate in terms of producing Poisson marginal distributions and the pre-specified 
correlation matrix. 
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Fig. 4. Comparing the Poisson marginals to tine expected values 
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Fig. 5. Comparing the desired correlation to the resulting actual correlation for low rates 
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Fig. 6. Comparing the desired correlation to the corrected actual correlation 

APPENDIX: Generating Multivariate Poisson Data in R 

# Generate a p-dimensional Poisson 

# p = the dimension of the distribution 

# samples = the number of observations 

# R = correlation matrix p X p 

# lambda = rate vector p X 1 
GenerateMultivariatePoisson<-function(p, samples, R, lambda) { 

normal_mu=rep(0, p) 

normal = mvrnorm ( samples , normal_mu, R) 
pois = normal 
p=pnorm (normal ) 

for (s in 1 :p){pois [, s] =qpois(p [, s] , lambda [s])} 
return(pois) 

} 

# Correct initial correlation between a 

# certain pair of series 

# lambdal = rate of first series 

# lambda2 = rate of second series 

# r = desired correlation 
CorrectlnitialCorreK-functiondambdal , lambda2, r){ 

samples=500 

u = runif (samples , 0, 1) 
lambda=c (lambdal , lambda2) 

maxcor=cor (qpois(u, lambdal), qpois(u, lambda2)) 
mincor=cor (qpois(u, lambdal), qpois(l-u, lambda2)) 
a=-maxcor*mincor/ (maxcor+mincor) 
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b=log( (maxcor+a)/a, exp(l)) 
c=-a 

corrected=log( (r+a)/a, exp(l))/b 
corrected=if else ( (corrected>l I corrected< (-1) ) , 
NA, corrected) 

return (corrected) 



